Continuum Armed Bandit Problem of Few Variables in High Dimensions

Authors

  • Hemant Tyagi
  • Bernd Gärtner
Abstract

We consider the stochastic and adversarial settings of continuum armed bandits where the arms are indexed by $[0,1]^d$. The reward functions $r : [0,1]^d \to \mathbb{R}$ are assumed to intrinsically depend on at most $k$ coordinate variables, implying $r(x_1, \dots, x_d) = g(x_{i_1}, \dots, x_{i_k})$ for distinct and unknown $i_1, \dots, i_k \in \{1, \dots, d\}$ and some locally Hölder continuous $g : [0,1]^k \to \mathbb{R}$ with exponent $\alpha \in (0, 1]$. Firstly, assuming $(i_1, \dots, i_k)$ to be fixed across time, we propose a simple modification of the CAB1 algorithm where we construct the discrete set of sampling points to obtain a bound of $O\left(n^{\frac{\alpha+k}{2\alpha+k}} (\log n)^{\frac{\alpha}{2\alpha+k}} C(k, d)\right)$ on the regret, with $C(k, d)$ depending at most polynomially on $k$ and sub-logarithmically on $d$. The construction is based on creating partitions of $\{1, \dots, d\}$ into $k$ disjoint subsets and is probabilistic, hence our result holds with high probability. Secondly, we extend our results to also handle the more general case where $(i_1, \dots, i_k)$ can change over time and derive regret bounds for the same.
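
The partition-based construction of sampling points mentioned in the abstract can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the authors' algorithm: the function names (`random_partition`, `separates`, `sampling_points`), the uniform random assignment of coordinates to groups, and the grid `{1/m, ..., 1}` are choices made here for the example, and the number of partitions to draw is left as a free parameter. The idea it shows is that a point which is constant on each group of a partition still reaches every grid value on the unknown active coordinates, provided the partition places those coordinates in different groups.

```python
import itertools
import random


def random_partition(d, k, rng):
    """Assign each coordinate 0..d-1 to one of k groups uniformly at random.

    Some groups may end up empty; that is harmless for this illustration.
    """
    return [rng.randrange(k) for _ in range(d)]


def separates(labels, active):
    """Return True if the active coordinates all fall into different groups."""
    return len({labels[i] for i in active}) == len(active)


def sampling_points(labels, k, m):
    """Enumerate points of [0,1]^d that are constant on each group.

    Each group takes a value from the grid {1/m, 2/m, ..., 1}, so the
    enumeration yields m**k points (hypothetical grid choice).
    """
    grid = [j / m for j in range(1, m + 1)]
    return [
        tuple(values[g] for g in labels)
        for values in itertools.product(grid, repeat=k)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    d, k, m = 20, 2, 4
    active = (3, 11)  # hypothetical active coordinates, unknown to the learner
    partitions = [random_partition(d, k, rng) for _ in range(8)]
    points = {p for labels in partitions for p in sampling_points(labels, k, m)}
    covered = any(separates(labels, active) for labels in partitions)
    print(len(points), covered)
```

A finite-armed bandit routine could then be run over the resulting point set; drawing several partitions raises the chance that at least one of them separates the active coordinates, which is where the high-probability qualifier comes from.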

Similar resources

Stochastic Continuum Armed Bandit Problem of Few Linear Parameters in High Dimensions
Hemant Tyagi, Sebastian Stich and Bernd Gärtner

We consider a stochastic continuum armed bandit problem where the arms are indexed by the $\ell_2$ ball $B_d(1+\nu)$ of radius $1+\nu$ in $\mathbb{R}^d$. The reward functions $r : B_d(1+\nu) \to \mathbb{R}$ are considered to intrinsically depend on $k \ll d$ unknown linear parameters so that $r(x) = g(Ax)$ where $A$ is a full rank $k \times d$ matrix. Assuming the mean reward function to be smooth we make use of results from low-rank matrix recovery l...

Improved Rates for the Stochastic Continuum-Armed Bandit Problem

Considering one-dimensional continuum-armed bandit problems, we propose an improvement of an algorithm of Kleinberg and a new set of conditions which give rise to improved rates. In particular, we introduce a novel assumption that is complementary to the previous smoothness conditions, while at the same time smoothness of the mean payoff function is required only at the maxima. Under these new ...

Showing Relevant Ads via Context Multi-Armed Bandits

We study context multi-armed bandit problems where the context comes from a metric space and the payoff satisfies a Lipschitz condition with respect to the metric. Abstractly, a context multi-armed bandit problem models a situation where, in a sequence of independent trials, an online algorithm chooses an action based on a given context (side information) from a set of possible actions so as to...

Medoids in almost linear time via multi-armed bandits

Computing the medoid of a large number of points in high-dimensional space is an increasingly common operation in many data science problems. We present an algorithm Med-dit which uses O(n log n) distance evaluations to compute the medoid with high probability. Med-dit is based on a connection with the multi-armed bandit problem. We evaluate the performance of Med-dit empirically on the Netflix...


Publication date: 2013